Realization of high-performance bilingual English/Arabic articulated document analysis and understanding system
Internal identifier: 001774 (Main/Exploration)
Authors: A. I. El Desouky [Egypt]; A. O. Abd El Gwad; H. Arafat Ali
Source:
- International Journal of Computer Applications in Technology [0952-8091]; 2003.
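French descriptors
- Pascal (Inist): Théorie, Traduction langage assistée, Reconnaissance forme, Balayage, Système temps réel, Algorithme, Reconnaissance optique caractère, Expérience
English descriptors
- KwdEn: Algorithms, Computer aided language translation, Document analysis, Experiments, Optical character recognition, Pattern recognition, Real time systems, Scanning, Theory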
Abstract
Document analysis and understanding (DAU) systems aim not only to recognize text but also to extract relevant information from a scanned document. Numerous studies have introduced efficient algorithms for document analysis, and some of them have proposed an articulation and/or a translation stage. However, little work has been done in the area of English/Arabic (E/A) translation. The main objective of this paper is to combine three trends: document understanding, E/A translation, and delivery of the output as articulated voice. The paper focuses on the realization of a bilingual articulated E/A system based on optical character recognition (OCR). The input to the proposed system is English or Arabic text captured by a scanner or video camera; the output is an articulated voice. The proposed scheme consists of two phases. The first phase comprises several processes, beginning with the conversion of the scanned document into an electronically processable form. The segmentation step then addresses the essential problem in Arabic OCR, namely how to cope with the various shapes of the same character, and a new methodology for segmenting Arabic characters is presented. At the end of this phase, an efficient text recognition method based on a hybrid description (ANN, FFT) is used. To verify the performance of this phase, experiments with printed text were performed; the error rates were less than 0.1%, and the results showed that the scheme proposed for this phase is very robust. In the second phase, a database of 3000 audio files was assembled to convert each word of the input text (the output of the first phase) to its corresponding audio. This research can help in many real-time applications, such as immediate translation, machine reading for blind people, and learning.
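The second phase described in the abstract maps each recognized word to a stored audio file. A minimal sketch of such a lookup might look like the following. It is only an illustration under stated assumptions: the `audio` directory, the one-file-per-word `.wav` naming, and the functions `build_index` and `words_to_audio` are hypothetical and are not taken from the paper.

```python
from pathlib import Path

# Hypothetical layout (an assumption, not from the paper):
# one recording per known word, e.g. audio/document.wav.
AUDIO_DIR = Path("audio")


def build_index(words):
    """Map each known word to the hypothetical path of its recording."""
    return {w.lower(): AUDIO_DIR / f"{w.lower()}.wav" for w in words}


def words_to_audio(text, index):
    """Return a (word, path-or-None) pair for each word of the recognized text.

    Words missing from the index map to None so a caller can decide
    how to handle them (skip, spell out, log, ...).
    """
    plan = []
    for raw in text.split():
        word = raw.strip(".,;:!?()").lower()
        if word:
            plan.append((word, index.get(word)))
    return plan


# Tiny illustrative vocabulary standing in for the 3000-entry database.
INDEX = build_index(["document", "analysis", "system"])
PLAN = words_to_audio("Document analysis, system overview.", INDEX)
```

A dictionary keeps the per-word lookup constant-time, which suits the real-time uses the abstract mentions; how the actual system stores and retrieves its audio files is not stated in this record.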
Affiliations:
- Egypt: A. I. El Desouky
- No country recorded: A. O. Abd El Gwad, H. Arafat Ali
Links toward previous steps (curation, corpus...)
- to stream PascalFrancis, to step Corpus: 000642
- to stream PascalFrancis, to step Curation: 000149
- to stream PascalFrancis, to step Checkpoint: 000538
- to stream Main, to step Merge: 001852
- to stream Main, to step Curation: 001774
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">Realization of high-performance bilingual English/Arabic articulated document analysis and understanding system</title>
<author><name sortKey="El Desouky, A I" sort="El Desouky, A I" uniqKey="El Desouky A" first="A. I." last="El Desouky">A. I. El Desouky</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Information Systems Department Faculty of Comp. and Info. Systems Mansoura University</s1>
<s2>Mansoura</s2>
<s3>EGY</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
<country>Égypte</country>
<wicri:noRegion>Information Systems Department Faculty of Comp. and Info. Systems Mansoura University</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Abd El Gwad, A O" sort="Abd El Gwad, A O" uniqKey="Abd El Gwad A" first="A. O." last="Abd El Gwad">A. O. Abd El Gwad</name>
</author>
<author><name sortKey="Arafat Ali, H" sort="Arafat Ali, H" uniqKey="Arafat Ali H" first="H." last="Arafat Ali">H. Arafat Ali</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">03-0003300</idno>
<date when="2003">2003</date>
<idno type="stanalyst">PASCAL 03-0003300 EI</idno>
<idno type="RBID">Pascal:03-0003300</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000642</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000149</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000538</idno>
<idno type="wicri:doubleKey">0952-8091:2003:El Desouky A:realization:of:high</idno>
<idno type="wicri:Area/Main/Merge">001852</idno>
<idno type="wicri:Area/Main/Curation">001774</idno>
<idno type="wicri:Area/Main/Exploration">001774</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">Realization of high-performance bilingual English/Arabic articulated document analysis and understanding system</title>
<author><name sortKey="El Desouky, A I" sort="El Desouky, A I" uniqKey="El Desouky A" first="A. I." last="El Desouky">A. I. El Desouky</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Information Systems Department Faculty of Comp. and Info. Systems Mansoura University</s1>
<s2>Mansoura</s2>
<s3>EGY</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
<country>Égypte</country>
<wicri:noRegion>Information Systems Department Faculty of Comp. and Info. Systems Mansoura University</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Abd El Gwad, A O" sort="Abd El Gwad, A O" uniqKey="Abd El Gwad A" first="A. O." last="Abd El Gwad">A. O. Abd El Gwad</name>
</author>
<author><name sortKey="Arafat Ali, H" sort="Arafat Ali, H" uniqKey="Arafat Ali H" first="H." last="Arafat Ali">H. Arafat Ali</name>
</author>
</analytic>
<series><title level="j" type="main">International Journal of Computer Applications in Technology</title>
<title level="j" type="abbreviated">Int J Comput Appl Technol</title>
<idno type="ISSN">0952-8091</idno>
<imprint><date when="2003">2003</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">International Journal of Computer Applications in Technology</title>
<title level="j" type="abbreviated">Int J Comput Appl Technol</title>
<idno type="ISSN">0952-8091</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Algorithms</term>
<term>Computer aided language translation</term>
<term>Document analysis</term>
<term>Experiments</term>
<term>Optical character recognition</term>
<term>Pattern recognition</term>
<term>Real time systems</term>
<term>Scanning</term>
<term>Theory</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Théorie</term>
<term>Traduction langage assistée</term>
<term>Reconnaissance forme</term>
<term>Balayage</term>
<term>Système temps réel</term>
<term>Algorithme</term>
<term>Reconnaissance optique caractère</term>
<term>Expérience</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Document analysis and understanding (DAU) systems aim not only at the recognition of text but also at the extraction of relevant information out of a scanned document. Numerous studies have introduced efficient algorithms for document analysis, some of these studies proposed an articulated and/or a translation stage. Also little work has been done in the English/Arabic (E/A) translation area. The main objective of this paper is to introduce a combination between the three trends (document understanding, E/A translation, and handling the output in articulated voice). This paper focuses on the realization of a bilingual articulated E/A system based on optical character recognition (OCR). The input of the proposed system will be an English or Arabic text through scanner or video camera; the output will be an articulated voice. The proposed scheme consists of two phases. The first phase consists of many processes, beginning with converting scanned document into an electronic processable form. Then in the segmentation step, the essential problem in Arabic OCR, that is, how to cope with the various shapes of the same character, is solved. A new methodology for segmenting Arabic characters is presented. At the end of this phase an efficient method of text recognition based on hybrid description (ANN, FFT) is used. In order to verify the performance of this phase, experiments with printed text were performed. The error rates were less than 0.1%. Results showed that the proposed scheme in this phase is very robust. In the second phase, a database with 3000 audio files was encapsulated to convert each word from the input text (the output of the first phase) to its correspondence. This research can help in many real-time applications such as: (immediate translation, machine reader for blind people, learning...).</div>
</front>
</TEI>
<affiliations><list><country><li>Égypte</li>
</country>
</list>
<tree><noCountry><name sortKey="Abd El Gwad, A O" sort="Abd El Gwad, A O" uniqKey="Abd El Gwad A" first="A. O." last="Abd El Gwad">A. O. Abd El Gwad</name>
<name sortKey="Arafat Ali, H" sort="Arafat Ali, H" uniqKey="Arafat Ali H" first="H." last="Arafat Ali">H. Arafat Ali</name>
</noCountry>
<country name="Égypte"><noRegion><name sortKey="El Desouky, A I" sort="El Desouky, A I" uniqKey="El Desouky A" first="A. I." last="El Desouky">A. I. El Desouky</name>
</noRegion>
</country>
</tree>
</affiliations>
</record>
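The keyword lists in a record like this one can be read programmatically. The sketch below uses Python's standard `xml.etree.ElementTree` on a shortened fragment of the `<profileDesc>` element. The fragment, the constant names, and the function `keywords_by_scheme` are illustrative assumptions. Note that the full record as displayed uses the `wicri:` and `inist:` prefixes without declaring them, so a strict XML parser would reject it unless those namespaces are declared or the prefixed parts are removed first.

```python
import xml.etree.ElementTree as ET

# Shortened fragment of the record's <profileDesc>; it avoids the
# undeclared wicri:/inist: prefixes, which a strict parser rejects.
SAMPLE = """<profileDesc><textClass>
<keywords scheme="KwdEn" xml:lang="en"><term>Algorithms</term>
<term>Document analysis</term>
<term>Optical character recognition</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Théorie</term>
<term>Reconnaissance optique caractère</term>
</keywords>
</textClass></profileDesc>"""

# ElementTree exposes xml:lang under the predefined XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def keywords_by_scheme(xml_text):
    """Return {(scheme, language): [terms]} for every <keywords> element."""
    root = ET.fromstring(xml_text)
    result = {}
    for kw in root.iter("keywords"):
        key = (kw.get("scheme"), kw.get(XML_LANG))
        result[key] = [t.text.strip() for t in kw.findall("term") if t.text]
    return result


KEYWORDS = keywords_by_scheme(SAMPLE)
```

Keying the result by both scheme and language keeps the English `KwdEn` terms and the French `Pascal` terms separate, mirroring the two `<keywords>` elements in the record.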
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001774 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001774 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Exploration |type= RBID |clé= Pascal:03-0003300 |texte= Realization of high-performance bilingual English/Arabic articulated document analysis and understanding system }}
This area was generated with Dilib version V0.6.32.